The surrogate loss of variational autoencoders (VAEs) poses various challenges to their training, inducing an imbalance between task fitting and representation inference. To mitigate this, existing strategies for VAEs focus on adjusting the tradeoff by introducing hyperparameters, deriving a tighter bound under mild assumptions, or decomposing the loss components for certain neural settings. Nevertheless, VAEs still suffer from uncertain tradeoff learning. We propose a novel evolutionary variational autoencoder (eVAE), building on variational information bottleneck (VIB) theory and integrative evolutionary neural learning. eVAE integrates a variational genetic algorithm into VAE with variational evolutionary operators, including variational mutation, crossover, and evolution. Its inner-outer-joint training mechanism synergistically and dynamically generates and updates the uncertain tradeoff learning in the evidence lower bound (ELBO) without additional constraints. Beyond learning a lossy compression and representation of data under the VIB assumption, eVAE presents an evolutionary paradigm for tuning critical factors of VAEs and deep neural networks, and addresses the premature convergence and random search problems by integrating evolutionary optimization into deep learning. Experiments show that eVAE addresses the KL-vanishing problem in text generation with low reconstruction loss, generates all disentangled factors with sharp images, and improves image generation quality. eVAE achieves better reconstruction loss, disentanglement, and generation-inference balance than its competitors.
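For intuition, the sketch below is a minimal, hypothetical rendering of the inner-outer pattern: an inner gradient loop trains a toy VAE on a beta-weighted ELBO while an outer evolutionary loop mutates and selects beta by fitness. The population size, fitness choice, and plain Gaussian mutation are our assumptions, standing in for eVAE's actual variational evolutionary operators.

```python
# Minimal sketch of an inner-outer loop that evolves the ELBO tradeoff weight.
# Hypothetical: the real eVAE uses variational mutation/crossover operators;
# here a plain Gaussian mutation on beta stands in for them.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyVAE(nn.Module):
    def __init__(self, x_dim=32, z_dim=4):
        super().__init__()
        self.enc = nn.Linear(x_dim, 2 * z_dim)   # outputs mean and log-variance
        self.dec = nn.Linear(z_dim, x_dim)

    def forward(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
        return self.dec(z), mu, logvar

def neg_elbo(model, x, beta):
    recon, mu, logvar = model(x)
    rec = F.mse_loss(recon, x, reduction="mean")
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).mean()
    return rec + beta * kl, rec

model = TinyVAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
betas = [0.5, 1.0, 2.0]                      # assumed initial population
x = torch.randn(64, 32)                      # stand-in data batch

for generation in range(10):                 # outer evolutionary loop
    fitness = []
    for beta in betas:                       # inner gradient-based loop
        loss, rec = neg_elbo(model, x, beta)
        opt.zero_grad(); loss.backward(); opt.step()
        fitness.append(rec.item())           # assumed fitness: reconstruction
    best = betas[fitness.index(min(fitness))]
    # mutate around the fittest beta (crossover omitted for brevity)
    betas = [abs(best + 0.1 * torch.randn(1).item()) for _ in betas]
```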
Score-based diffusion models have captured widespread attention and fueled rapid progress in recent vision generative tasks. In this paper, we focus on the diffusion model backbone, which has previously received little attention. We systematically explore vision Transformers as diffusion learners for various generative tasks. With our improvements, the performance of a vanilla ViT-based backbone (IU-ViT) is boosted to be on par with traditional U-Net-based methods. We further provide a hypothesis on the implication of disentangling the generative backbone into an encoder-decoder structure, and show proof-of-concept experiments verifying the effectiveness of a stronger encoder for generative tasks with an ASymmetriC ENcoder-Decoder (ASCEND). Our improvements achieve competitive results on CIFAR-10, CelebA, LSUN, CUB Bird, and large-resolution text-to-image tasks. To the best of our knowledge, we are the first to successfully train a single diffusion model on a text-to-image task beyond 64x64 resolution. We hope this will motivate people to rethink the modeling choices and training pipelines for diffusion-based generative models.
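To picture the encoder-decoder disentanglement hypothesis, the sketch below shows one hypothetical way to split a Transformer denoiser into a deeper "encoder" stack and a shallower "decoder" stack over patch tokens; the block counts, dimensions, and timestep conditioning are illustrative assumptions, not the ASCEND architecture.

```python
# Hypothetical asymmetric encoder-decoder denoiser: most capacity in the
# encoder stack, a lightweight decoder stack after it. Illustrative only.
import torch
import torch.nn as nn

class AsymmetricDenoiser(nn.Module):
    def __init__(self, dim=256, enc_depth=8, dec_depth=2):
        super().__init__()
        layer = lambda: nn.TransformerEncoderLayer(
            d_model=dim, nhead=8, batch_first=True)
        self.encoder = nn.ModuleList([layer() for _ in range(enc_depth)])
        self.decoder = nn.ModuleList([layer() for _ in range(dec_depth)])
        self.t_embed = nn.Linear(1, dim)      # toy timestep embedding
        self.head = nn.Linear(dim, dim)       # predicts noise per token

    def forward(self, tokens, t):
        h = tokens + self.t_embed(t[:, None, None].float())
        for blk in self.encoder:              # most capacity lives here
            h = blk(h)
        for blk in self.decoder:              # lightweight decoding stack
            h = blk(h)
        return self.head(h)

x = torch.randn(4, 64, 256)                   # batch of noised patch tokens
t = torch.randint(0, 1000, (4,))
noise_pred = AsymmetricDenoiser()(x, t)
```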
Understanding when and how much a model gradient leaks information about training samples is an important question in privacy. In this paper, we present a surprising result: even without training or memorizing the data, we can fully reconstruct the training samples from a single gradient query at a randomly chosen parameter value. We prove the identifiability of the training data under mild conditions: with shallow or deep neural networks and a wide range of activation functions. We also present a statistically and computationally efficient algorithm based on tensor decomposition to reconstruct the training data. As a provable attack that reveals sensitive training data, our findings suggest severe potential threats to privacy, especially in federated learning.
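The paper's tensor-decomposition algorithm is not reproduced here; as a simpler illustration of recovering data from a single gradient query, the sketch below uses classic gradient matching (optimizing a dummy input until its gradient matches the observed one), with the label assumed known for brevity.

```python
# Gradient-matching reconstruction, in the style of "deep leakage from
# gradients": not the paper's tensor-decomposition attack, only a simple
# illustration that one gradient query can pin down a training sample.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
x_true = torch.randn(1, 16)                      # the private sample
y_true = torch.randn(1, 1)                       # label assumed known here

loss_fn = nn.MSELoss()
true_grads = torch.autograd.grad(
    loss_fn(net(x_true), y_true), net.parameters())

x_dummy = torch.randn(1, 16, requires_grad=True)  # attacker's guess
opt = torch.optim.Adam([x_dummy], lr=0.05)

for step in range(500):
    dummy_grads = torch.autograd.grad(
        loss_fn(net(x_dummy), y_true), net.parameters(), create_graph=True)
    # distance between the observed gradient and the dummy gradient
    match = sum((dg - tg).pow(2).sum()
                for dg, tg in zip(dummy_grads, true_grads))
    opt.zero_grad(); match.backward(); opt.step()

print((x_dummy - x_true).norm())                 # shrinks as matching improves
```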
Image super-resolution is a common task on mobile and IoT devices, where one often needs to upscale and enhance low-resolution images and video frames. While numerous solutions have been proposed for this problem in the past, they are usually incompatible with low-power mobile NPUs, which have many computational and memory constraints. In this Mobile AI challenge, we address this problem and task the participants with designing an efficient quantized image super-resolution solution that can demonstrate real-time performance on mobile NPUs. The participants were provided with the DIV2K dataset and trained INT8 models to perform high-quality 3X image upscaling. The runtime of all models was evaluated on the Synaptics VS680 Smart Home board with a dedicated edge NPU capable of accelerating quantized neural networks. All proposed solutions are fully compatible with the above NPU, demonstrating rates of up to 60 FPS when reconstructing Full HD resolution images. A detailed description of all models developed in the challenge is provided in this paper.
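As a rough picture of what such an entry looks like, the sketch below is a minimal NPU-friendly 3X upscaler built from plain convolutions and depth-to-space upsampling; the layer widths and residual design are assumptions, and real entries additionally apply INT8 quantization-aware training.

```python
# Minimal 3x super-resolution sketch: plain convolutions plus PixelShuffle
# (depth-to-space), the mobile-friendly pattern most entries build on.
import torch
import torch.nn as nn

class TinySR(nn.Module):
    def __init__(self, scale=3, channels=16):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, 3 * scale * scale, 3, padding=1))
        self.upsample = nn.PixelShuffle(scale)   # quantization-friendly upsampling

    def forward(self, x):
        # residual against a nearest-upsampled input stabilizes training
        base = nn.functional.interpolate(x, scale_factor=3, mode="nearest")
        return base + self.upsample(self.body(x))

lr = torch.randn(1, 3, 360, 640)                 # low-res input
hr = TinySR()(lr)                                # -> (1, 3, 1080, 1920), Full HD
```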
High-performing deep learning methods typically rely on large annotated training datasets, which are difficult to obtain in many clinical applications due to the high cost of medical image labeling. Existing data valuation methods usually require prior knowledge of the labels, which is infeasible for the goal of knowing which data to label. To this end, we formulate and propose a novel and efficient data valuation strategy, the exponential marginal singular value score, to rank unlabeled medical image data based on useful latent representations extracted by a self-supervised learning (SSL) network. Motivated by the theoretical implications of the SSL embedding space, we leverage a masked autoencoder for feature extraction. Furthermore, we evaluate data quality by the marginal change of the largest singular value after excluding a data point from the dataset. We conduct extensive experiments on a pathology dataset. Our results demonstrate the effectiveness and efficiency of the proposed method in selecting the most valuable data.
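A hedged sketch of the scoring idea: rank each sample by the (exponentiated) marginal change of the largest singular value of the embedding matrix when that sample is left out. The random features and exact weighting below are stand-ins for the paper's masked-autoencoder features and precise formula.

```python
# Leave-one-out singular-value scoring over an SSL embedding matrix.
import numpy as np

rng = np.random.default_rng(0)
embeddings = rng.standard_normal((200, 64))      # rows: SSL features per image

def top_singular_value(m):
    return np.linalg.svd(m, compute_uv=False)[0]

full_sv = top_singular_value(embeddings)
scores = []
for i in range(embeddings.shape[0]):
    loo = np.delete(embeddings, i, axis=0)       # leave-one-out embedding matrix
    scores.append(np.exp(full_sv - top_singular_value(loo)))

ranking = np.argsort(scores)[::-1]               # most valuable samples first
print(ranking[:10])
```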
Deep learning-based single image super-resolution (SISR) methods have attracted much attention and achieved remarkable success on modern high-end GPUs. However, most state-of-the-art methods require large numbers of parameters, memory, and computational resources, and usually show poor inference times on current mobile device CPUs/NPUs. In this paper, we propose a simple plain convolutional network with a fast nearest convolution module (NCNet), which is NPU-friendly and performs reliable super-resolution in real time. The proposed nearest convolution has the same performance as nearest-neighbor upsampling but is faster and better suited to the Android NNAPI. Our model can be easily deployed on mobile devices with 8-bit quantization and is fully compatible with all major mobile AI accelerators. Moreover, we conduct comprehensive experiments on different tensor operations on a mobile device to illustrate the efficiency of our network architecture. Our NCNet is trained and validated on the DIV2K 3X dataset, and comparisons with other efficient SR methods show that NCNet achieves high-fidelity SR results while using less inference time. Our code and pretrained models are publicly available at https://github.com/algolzw/ncnet.
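Our reading of the nearest convolution, sketched below: a 1x1 convolution whose weights replicate each input channel s*s times, followed by depth-to-space, reproduces nearest-neighbor upsampling exactly while running as plain convolution ops. The weight layout is our reconstruction, not the released code.

```python
# A 1x1 conv that copies each channel s*s times + PixelShuffle equals
# nearest-neighbor upsampling, expressed entirely in NPU-friendly conv ops.
import torch
import torch.nn as nn

def nearest_conv(channels=3, scale=3):
    conv = nn.Conv2d(channels, channels * scale * scale, kernel_size=1, bias=False)
    w = torch.zeros_like(conv.weight)            # shape (C*s*s, C, 1, 1)
    for c in range(channels):
        w[c * scale * scale:(c + 1) * scale * scale, c] = 1.0  # copy channel c
    conv.weight.data.copy_(w)
    return nn.Sequential(conv, nn.PixelShuffle(scale))

x = torch.rand(1, 3, 8, 8)
up = nearest_conv()(x)
ref = nn.functional.interpolate(x, scale_factor=3, mode="nearest")
print(torch.allclose(up, ref))                   # True: identical to nearest upsampling
```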
This paper reviews the Challenge on Super-Resolution of Compressed Image and Video at AIM 2022. The challenge includes two tracks: Track 1 targets the super-resolution of compressed images, and Track 2 targets the super-resolution of compressed videos. In Track 1, we use the popular DIV2K dataset as the training, validation, and test sets. In Track 2, we propose the LDV 3.0 dataset, which contains 365 videos, including the LDV 2.0 dataset (335 videos) and 30 additional videos. In this challenge, 12 teams and 2 teams submitted final results for Track 1 and Track 2, respectively. The proposed methods and solutions gauge the state of the art of super-resolution on compressed images and videos. The proposed LDV 3.0 dataset is available at https://github.com/renyang-home/ldv_dataset. The homepage of this challenge is https://github.com/renyang-home/aim22_compresssr.
Causal inference has become a powerful tool for dealing with the out-of-distribution (OOD) generalization problem, which aims to extract invariant features. However, conventional methods apply causal learners on multiple data splits, which may produce biased representation learning from the data distribution and encounter difficulty in learning invariant features from heterogeneous sources. To address these issues, this paper presents a balanced meta-causal learner (BMCL), which includes a balanced task generation (BTG) module and a meta-causal feature learning (MCFL) module. Specifically, the BTG module learns to generate balanced subsets through a self-learned partitioning algorithm with constraints on the proportions of sample classes and contexts. The MCFL module trains a meta-learner adapted to different distributions. Experiments conducted on the NICO++ dataset verify that BMCL effectively identifies class-invariant visual regions for classification and can serve as a general framework to improve the performance of state-of-the-art methods.
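As a simple stand-in for the learned partitioning in BTG, the sketch below draws subsets with an equal quota per (class, context) cell; BMCL instead learns the partition end-to-end, so treat this purely as an illustration of the balance constraint.

```python
# Balanced task generation, illustrated with a fixed quota per
# (class, context) cell rather than a learned partition.
import numpy as np

rng = np.random.default_rng(0)
classes = rng.integers(0, 4, size=1000)          # per-sample class labels
contexts = rng.integers(0, 3, size=1000)         # per-sample context labels

def balanced_subset(per_cell=10):
    idx = []
    for c in np.unique(classes):
        for k in np.unique(contexts):
            cell = np.where((classes == c) & (contexts == k))[0]
            take = min(per_cell, len(cell))      # equal quota per (class, context)
            idx.extend(rng.choice(cell, size=take, replace=False))
    return np.array(idx)

tasks = [balanced_subset() for _ in range(5)]    # balanced meta-learning tasks
print(len(tasks[0]))
```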
In this work, we revisit the weak-to-strong consistency framework popularized by FixMatch for semi-supervised classification, where the prediction on a weakly perturbed image serves as supervision for its strongly perturbed version. Intriguingly, we observe that this simple pipeline already achieves competitive results against recent advanced works when transferred to our segmentation scenario. Its success heavily relies on the manual design of strong data augmentations, however, which may be limited and inadequate for exploring a broader perturbation space. Motivated by this, we propose an auxiliary feature perturbation stream as a supplement, leading to an expanded perturbation space. On the other hand, to sufficiently probe the original image-level augmentations, we present a dual-stream perturbation technique, enabling two strong views to be simultaneously guided by a common weak view. Consequently, our overall unified dual-stream perturbation approach (UniMatch) significantly surpasses all existing methods across all evaluation protocols on the Pascal, Cityscapes, and COCO benchmarks. We also demonstrate the superiority of our method in remote sensing interpretation and medical image analysis. Code is available at https://github.com/liheyoung/unimatch.
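A hedged sketch of the unified dual-stream idea: the weak view's prediction pseudo-labels two strong image-level views plus one feature-perturbed stream. The toy networks, noise-based "strong" augmentations, and dropout perturbation below are stand-ins, and the confidence thresholding used in practice is omitted.

```python
# One weak view supervising two strong views and a feature-perturbed stream.
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Conv2d(3, 8, 3, padding=1)          # toy feature extractor
decoder = nn.Conv2d(8, 21, 1)                    # toy per-pixel classifier
feat_drop = nn.Dropout2d(0.5)                    # feature perturbation stream

x_weak = torch.randn(2, 3, 32, 32)               # weakly augmented view
x_s1 = x_weak + 0.1 * torch.randn_like(x_weak)   # stand-ins for strong views
x_s2 = x_weak + 0.1 * torch.randn_like(x_weak)

with torch.no_grad():
    pseudo = decoder(encoder(x_weak)).argmax(1)  # pseudo-label from weak view

feat = encoder(x_weak)
loss = (F.cross_entropy(decoder(encoder(x_s1)), pseudo) +
        F.cross_entropy(decoder(encoder(x_s2)), pseudo) +
        F.cross_entropy(decoder(feat_drop(feat)), pseudo)) / 3
loss.backward()
```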
Domain generalization (DG) aims to learn a model on source domains that generalizes well to unseen target domains. Although it has achieved great success, most existing methods require label information for all training samples in the source domains, which is time-consuming and expensive in the real world. In this paper, we resort to solving the semi-supervised domain generalization (SSDG) task, where only partial label information is available in each source domain. To address the task, we first analyze the theory of multi-domain learning, which highlights that 1) mitigating the impact of domain gaps and 2) exploiting all samples to train the model can effectively reduce the generalization error in each source domain and thereby improve the quality of pseudo-labels. Based on this analysis, we propose MultiMatch, which extends FixMatch to a multi-task learning framework to produce high-quality pseudo-labels for SSDG. Specifically, we treat each training domain as a single task (i.e., a local task) and combine all training domains together (i.e., the global task) as an extra task trained for the unseen test domain. In the multi-task framework, we use an independent BN and classifier for each task, which can effectively alleviate interference between different domains during pseudo-labeling. Meanwhile, most parameters in the framework are shared and can be trained with all training samples. Moreover, to further boost the pseudo-label accuracy and the model's generalization, we fuse the predictions of the global task and the local task during training and testing, respectively. A series of experiments verifies the effectiveness of the proposed method, which outperforms existing semi-supervised methods and SSDG methods on several benchmark DG datasets.
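The multi-task design can be pictured as below: a shared backbone with an independent BatchNorm and classifier per task, one local task per source domain plus a global task over all domains, with predictions fused at the end. The sizes and the averaging fusion rule are illustrative assumptions.

```python
# Shared backbone, per-task BatchNorm and classifier heads, fused predictions.
import torch
import torch.nn as nn

class MultiTaskNet(nn.Module):
    def __init__(self, n_domains=3, feat_dim=128, n_classes=10):
        super().__init__()
        self.backbone = nn.Linear(64, feat_dim)  # shared by all tasks
        n_tasks = n_domains + 1                  # local tasks + global task
        self.bns = nn.ModuleList([nn.BatchNorm1d(feat_dim) for _ in range(n_tasks)])
        self.heads = nn.ModuleList([nn.Linear(feat_dim, n_classes) for _ in range(n_tasks)])

    def forward(self, x, task):                  # task = domain id, or -1 for global
        h = self.backbone(x)
        return self.heads[task](self.bns[task](h))

net = MultiTaskNet()
x = torch.randn(8, 64)
local_logits = net(x, task=0)                    # pseudo-labeling within domain 0
global_logits = net(x, task=-1)                  # extra task for unseen domains
# fuse global and local predictions (simple averaging as an assumed rule)
fused = (local_logits.softmax(-1) + global_logits.softmax(-1)) / 2
```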